Pandas 101 - Weather Analysis at LGA, JFK, and EWR Airports

blogging
jupyter
Python
Back to Basics
Author

Kashish Mukheja

Published

Wednesday, 24 January 2024

A Report by …

@Author: Kashish Mukheja

Introduction

In this report, we analyze weather data from three major airports in the New York City area (LGA, JFK, and EWR) to determine monthly mean wind speeds and identify any outliers.

This report presents an analysis of monthly mean wind speeds at LGA, JFK, and EWR airports using historical weather data. The data is fetched from nycflights13_weather.csv file present publically on the internet. 1. We start with importing the data and adjusting the columns to the proper format in their SI unit. 2. After preprocessing the dataset and addressing outliers, we calculate the monthly mean wind speeds for each airport. 3. Subsequently, we visualize the monthly mean wind speeds for all three airports on a single plot, providing insights into wind speed variations over time.

The analysis aims to provide valuable information for stakeholders involved in aviation, urban planning, and renewable energy sectors.

Data Preparation

To begin, let’s import the necessary libraries, and load the data from the provided CSV file and examine its structure

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Load the data
df_weather = pd.read_csv('nycflights13_weather.csv')

# Display the first few rows of the DataFrame
df_weather.head()
origin year month day hour temp dewp humid wind_dir wind_speed wind_gust precip pressure visib time_hour
0 EWR 2013 1 1 0 37.04 21.92 53.97 230.0 10.35702 11.918651 0.0 1013.9 10.0 2013-01-01 1:00:00
1 EWR 2013 1 1 1 37.04 21.92 53.97 230.0 13.80936 15.891535 0.0 1013.0 10.0 2013-01-01 2:00:00
2 EWR 2013 1 1 2 37.94 21.92 52.09 230.0 12.65858 14.567241 0.0 1012.6 10.0 2013-01-01 3:00:00
3 EWR 2013 1 1 3 37.94 23.00 54.51 230.0 13.80936 15.891535 0.0 1012.7 10.0 2013-01-01 4:00:00
4 EWR 2013 1 1 4 37.94 24.08 57.04 240.0 14.96014 17.215830 0.0 1012.8 10.0 2013-01-01 5:00:00

Data Conversion to Metric Units

This section converts temperature, precipitation, visibility, and wind speed measurements from imperial to metric (SI) units. This is crucial to maintain uniformity and consistency at a global level for better interpretability. We convert the following columns:

  1. temp and dewp to Celsius

  2. precip to millimetres

  3. visib to metres

  4. wind_speed and wind_gust to metres per second

Note: We will replace all data in-place i.e., overwrite existing columns with new ones

Let’s start with converting the temperature from Fahrenheit to Celsius:

# Convert temperature from Fahrenheit to Celsius
df_weather['temp'] = (df_weather['temp'] - 32) * 5/9
df_weather['dewp'] = (df_weather['dewp'] - 32) * 5/9

Next, we’ll convert precipitation from inches to millimeters:

# Convert precipitation from inches to millimeters
df_weather['precip'] = df_weather['precip'] * 25.4

Then, we’ll convert visibility from miles to meters:

# Convert visibility from miles to meters
df_weather['visib'] = df_weather['visib'] * 1609.34

Now, let’s convert wind speed and gust from miles per hour to meters per second:

# Convert wind speed and gust from miles per hour to meters per second
df_weather['wind_speed'] = df_weather['wind_speed'] * 0.44704
df_weather['wind_gust'] = df_weather['wind_gust'] * 0.44704

Finally, let’s correct the time_hour column by shifting it back by one hour:

# Correct the time_hour column by shifting it back by one hour
df_weather['time_hour'] = pd.to_datetime(df_weather['time_hour']) - pd.Timedelta(hours=1)
df_weather.head(10)
origin year month day hour temp dewp humid wind_dir wind_speed wind_gust precip pressure visib time_hour
0 EWR 2013 1 1 0 2.8 -5.6 53.97 230.0 4.630002 5.328114 0.0 1013.9 16093.4 2013-01-01 00:00:00
1 EWR 2013 1 1 1 2.8 -5.6 53.97 230.0 6.173336 7.104152 0.0 1013.0 16093.4 2013-01-01 01:00:00
2 EWR 2013 1 1 2 3.3 -5.6 52.09 230.0 5.658892 6.512139 0.0 1012.6 16093.4 2013-01-01 02:00:00
3 EWR 2013 1 1 3 3.3 -5.0 54.51 230.0 6.173336 7.104152 0.0 1012.7 16093.4 2013-01-01 03:00:00
4 EWR 2013 1 1 4 3.3 -4.4 57.04 240.0 6.687781 7.696165 0.0 1012.8 16093.4 2013-01-01 04:00:00
5 EWR 2013 1 1 6 3.9 -3.3 59.37 270.0 4.630002 5.328114 0.0 1012.0 16093.4 2013-01-01 06:00:00
6 EWR 2013 1 1 7 3.9 -2.8 61.63 250.0 3.601113 4.144089 0.0 1012.3 16093.4 2013-01-01 07:00:00
7 EWR 2013 1 1 8 3.9 -2.2 64.43 240.0 5.144447 5.920127 0.0 1012.5 16093.4 2013-01-01 08:00:00
8 EWR 2013 1 1 9 4.4 -2.2 62.21 250.0 5.658892 6.512139 0.0 1012.2 16093.4 2013-01-01 09:00:00
9 EWR 2013 1 1 10 3.9 -2.2 64.43 260.0 5.658892 6.512139 0.0 1011.9 16093.4 2013-01-01 10:00:00

Daily Mean Wind Speed Calculation for LGA Airport

We now move further in our journey of exploring this data by calculating the average wind speed for each day at the LGA airport, based on the provided weather data. We will group the data by year, month and day.

# Filter data for LGA airport
df_lga = df_weather[df_weather['origin'] == 'LGA']

# Convert time_hour column to datetime using .loc
df_lga.loc[:, 'time_hour'] = pd.to_datetime(df_lga['time_hour'])

# Group data by year, month, and day and calculate daily mean wind speed
daily_mean_wind_speed = df_lga.groupby([df_lga['time_hour'].dt.year, 
                                        df_lga['time_hour'].dt.month, 
                                        df_lga['time_hour'].dt.day])['wind_speed'].mean()

# Display the daily mean wind speeds
print(daily_mean_wind_speed)
time_hour  time_hour  time_hour
2013       1          1            6.687781
                      2            6.430559
                      3            4.908660
                      4            6.880698
                      5            5.144447
                                     ...   
           12         26           3.301020
                      27           5.401669
                      28           4.672873
                      29           3.794030
                      30           6.001855
Name: wind_speed, Length: 364, dtype: float64

Visualization of Daily Mean Wind Speed at LGA Airport

This section presents a plot illustrating the daily mean wind speeds recorded at the LGA airport over time. The x-axis displays dates in a human-readable format for better interpretation. We use the daily_mean_wind_speed computed in the previous section to write the code for visualisation.

# Convert the index to datetime
daily_mean_wind_speed.index = pd.to_datetime(daily_mean_wind_speed.index.map(lambda x: '-'.join(map(str, x))))

# Create a plot
plt.figure(figsize=(10, 6))
plt.plot(daily_mean_wind_speed.index, daily_mean_wind_speed.values, color='blue', marker='o', linestyle='-')
plt.title('Daily Mean Wind Speed at LGA Airport')
plt.xlabel('Date')
plt.ylabel('Wind Speed (m/s)')
plt.grid(True)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Top Ten Windiest Days at LGA Airport

This section creates a DataFrame containing the dates and corresponding wind speeds for the top ten windiest days at LGA airport, based on the provided weather data.

To identify the ten windiest days at LGA airport, we’ll first calculate the daily mean wind speeds for all days. Then, we’ll sort the data to find the ten days with the highest mean wind speeds.

# Calculate daily mean wind speeds for LGA airport
daily_mean_wind_speed = df_lga.groupby([df_lga['time_hour'].dt.date])['wind_speed'].mean()

# Sort the data to identify the ten windiest days
top_ten_windiest_days = daily_mean_wind_speed.nlargest(10)

Let’s create a DataFrame with the date and corresponding wind speeds for the top ten windiest days

# Create a DataFrame with date and wind speed
top_ten_df = pd.DataFrame({'date': top_ten_windiest_days.index, 'wind_speed': top_ten_windiest_days.values})

# Print the DataFrame
print(top_ten_df)
         date  wind_speed
0  2013-11-24   11.317783
1  2013-01-31   10.717598
2  2013-02-17   10.010236
3  2013-02-21    9.192903
4  2013-02-18    9.174264
5  2013-03-14    9.109958
6  2013-11-28    8.938477
7  2013-05-26    8.852736
8  2013-05-25    8.766995
9  2013-02-20    8.659819

Monthly Mean Wind Speeds for All Three Airports

This section calculates and presents the monthly mean wind speeds for all three airports (LGA, JFK, and EWR), after addressing outliers in the data. The results are displayed in a DataFrame for easy interpretation and analysis.

To compute the monthly mean wind speeds for all three airports and address the outlier in the data, the following steps are followed:

  1. Filter the data for all three airports: LGA, JFK, and EWR.
  2. Replace the outlier with np.nan (NaN).
  3. Calculate the monthly mean wind speeds for each airport.
# Filter data for all three airports
df_all_airports = df_weather[(df_weather['origin'] == 'LGA') | (df_weather['origin'] == 'JFK') | (df_weather['origin'] == 'EWR')]

# Identify and replace the outlier with np.nan (NaN)
outlier_index = df_all_airports['wind_speed'].idxmax()  # Find the index of the outlier
df_all_airports.loc[outlier_index, 'wind_speed'] = np.nan  # Replace the outlier with NaN

# Calculate monthly mean wind speeds for all three airports
monthly_mean_wind_speeds = df_all_airports.groupby([df_all_airports['time_hour'].dt.year,
                                                    df_all_airports['time_hour'].dt.month,
                                                    df_all_airports['origin']])['wind_speed'].mean()

# Display the result
print(monthly_mean_wind_speeds)
time_hour  time_hour  origin
2013       1          EWR       4.328121
                      JFK       5.378474
                      LGA       5.068977
           2          EWR       4.731659
                      JFK       5.954831
                      LGA       5.522218
           3          EWR       5.144447
                      JFK       6.209389
                      LGA       5.848862
           4          EWR       4.265604
                      JFK       5.584480
                      LGA       4.931524
           5          EWR       3.709672
                      JFK       4.600270
                      LGA       4.197841
           6          EWR       4.202727
                      JFK       4.884367
                      LGA       4.462808
           7          EWR       4.022527
                      JFK       4.498271
                      LGA       4.181334
           8          EWR       3.349452
                      JFK       4.305163
                      LGA       3.734075
           9          EWR       3.574639
                      JFK       4.359919
                      LGA       3.934073
           10         EWR       3.650740
                      JFK       4.615364
                      LGA       4.568659
           11         EWR       4.599020
                      JFK       5.752821
                      LGA       5.416842
           12         EWR       3.911640
                      JFK       4.861502
                      LGA       4.526399
Name: wind_speed, dtype: float64

Let’s create a DataFrame with the year, month, airport and the corresponding monthly wind speeds. This will help to view the data in a more readable format.

# Create a DataFrame for monthly mean wind speeds
monthly_mean_wind_speeds_df = monthly_mean_wind_speeds.to_frame(name='monthly_mean_wind_speed')

# Rename index levels for better readability
monthly_mean_wind_speeds_df.index.names = ['year', 'month', 'airport']

# Display the DataFrame
monthly_mean_wind_speeds_df
monthly_mean_wind_speed
year month airport
2013 1 EWR 4.328121
JFK 5.378474
LGA 5.068977
2 EWR 4.731659
JFK 5.954831
LGA 5.522218
3 EWR 5.144447
JFK 6.209389
LGA 5.848862
4 EWR 4.265604
JFK 5.584480
LGA 4.931524
5 EWR 3.709672
JFK 4.600270
LGA 4.197841
6 EWR 4.202727
JFK 4.884367
LGA 4.462808
7 EWR 4.022527
JFK 4.498271
LGA 4.181334
8 EWR 3.349452
JFK 4.305163
LGA 3.734075
9 EWR 3.574639
JFK 4.359919
LGA 3.934073
10 EWR 3.650740
JFK 4.615364
LGA 4.568659
11 EWR 4.599020
JFK 5.752821
LGA 5.416842
12 EWR 3.911640
JFK 4.861502
LGA 4.526399

EDA - Monthly Mean Wind Speeds Comparison

This section compares the monthly mean wind speeds for LGA, JFK, and EWR airports, plotting the data on the same graph. The x-axis represents the date, and the y-axis represents the mean wind speed in meters per second. Each airport is differentiated by a distinct color, with a legend provided for clarity.

# Reset index to use year and month as separate columns
monthly_mean_wind_speeds_df.reset_index(inplace=True)

# Create a plot
plt.figure(figsize=(10, 6))
for airport in monthly_mean_wind_speeds_df['airport'].unique():
    data = monthly_mean_wind_speeds_df[monthly_mean_wind_speeds_df['airport'] == airport]
    plt.plot(data['year'].astype(str) + '-' + data['month'].astype(str), data['monthly_mean_wind_speed'], label=airport)

# Add labels and legend
plt.title('Monthly Mean Wind Speeds for LGA, JFK, and EWR Airports')
plt.xlabel('Date')
plt.ylabel('Mean Wind Speed (m/s)')
plt.legend()
plt.grid(False)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Conclusion

In conclusion, our analysis reveals distinct patterns in monthly mean wind speeds at LGA, JFK, and EWR airports. While all three airports experience fluctuations in wind speeds throughout the year, certain months exhibit higher wind speeds compared to others.

By identifying and visualizing these trends, stakeholders can better understand the wind climate at each airport and make informed decisions regarding infrastructure development, flight operations, and energy generation projects. Additionally, the identification and handling of outliers ensure the accuracy and reliability of the analysis results.

Moving forward, further research could explore the impact of wind speed variations on specific sectors and evaluate potential mitigation strategies for adverse

Thank You for Reading the Report

Back to top